Multi-dimensional surround view based search

ABSTRACT

According to various examples, a surround view can be used as a visual search query for an object to be searched and this surround view can be compared to a database of three dimensional models. A determination can then be made about whether any of the three dimensional models match the visual search query. Based on how closely the three dimensional models match the visual search query, matching objects can be provided in various formats such as ranked lists.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. §119 to Provisional U.S. Patent Application No. 61/903,359 (Attorney Docket No. FYSNP001P) by Holzer et al., filed on Nov. 12, 2013, titled “Systems and Methods For Providing Surround Views”, which is incorporated by reference herein in its entirety and for all purposes.

TECHNICAL FIELD

The present disclosure relates to generating surround views, which includes providing a multi-view interactive digital media representation.

DESCRIPTION OF RELATED ART

With modern computing platforms and technologies shifting towards mobile and wearable devices that include camera sensors as native acquisition input streams, the desire to record and preserve moments digitally in a different form than more traditional two-dimensional (2D) flat images and videos has become more apparent. Traditional digital media formats typically limit their viewers to a passive experience. For instance, a 2D flat image can be viewed from one angle and is limited to zooming in and out. Accordingly, traditional digital media formats, such as 2D flat images, do not easily lend themselves to reproducing memories and events with high fidelity.

Current predictions (Ref: KPCB “Internet Trends 2012” presentation) indicate that every several years the quantity of visual data that is being captured digitally online will double. As this quantity of visual data increases, so does the need for much more comprehensive search and indexing mechanisms than ones currently available. Unfortunately, neither 2D images nor 2D videos have been designed for these purposes. Accordingly, improved mechanisms that allow users to view and index visual data, as well as query and quickly receive meaningful results from visual data are desirable.

Overview

According to various examples, a surround view can be used as a visual search query for an object to be searched and this surround view can be compared to a database of three dimensional models. A determination can then be made about whether any of the three dimensional models match the visual search query. Based on how closely the three dimensional models match the visual search query, matching objects can be provided in various formats such as ranked lists.
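By way of illustration only, the comparison and ranking step described above might be sketched as follows. The disclosure does not specify a similarity measure at this point; the sketch assumes, hypothetically, that the query surround view and each three-dimensional model have already been reduced to numeric descriptor vectors and that cosine similarity is an acceptable closeness score. The names `cosine_similarity`, `rank_matches`, and `min_score` are illustrative and do not come from the disclosure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(query_descriptor, model_descriptors, min_score=0.0):
    """Return (model name, score) pairs ordered from closest to least close match.

    model_descriptors is a hypothetical mapping of model name to descriptor.
    Models scoring below min_score are treated as non-matches and dropped.
    """
    scored = [
        (name, cosine_similarity(query_descriptor, descriptor))
        for name, descriptor in model_descriptors.items()
    ]
    matches = [item for item in scored if item[1] >= min_score]
    return sorted(matches, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {"chair": [1.0, 0.0], "lamp": [0.0, 1.0], "stool": [1.0, 1.0]}
    for name, score in rank_matches([1.0, 0.0], database, min_score=0.5):
        print(name, round(score, 3))
```

The threshold stands in for the determination of whether a model matches at all, and the sort produces the ranked list.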

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate particular embodiments of the present invention.

FIG. 1 illustrates an example of a surround view acquisition system.

FIG. 2 illustrates an example of a process flow for generating a surround view.

FIG. 3 illustrates one example of multiple camera views that can be fused into a three-dimensional (3D) model to create an immersive experience.

FIG. 4 illustrates one example of separation of content and context in a surround view.

FIGS. 5A-5B illustrate examples of concave and convex views, respectively, where both views use a back-camera capture style.

FIGS. 6A-6E illustrate examples of various capture modes for surround views.

FIG. 7 illustrates one example of a process for recording data that can be used to generate a surround view.

FIG. 8 illustrates an example of a surround view in which three-dimensional content is blended with a two-dimensional panoramic context.

FIG. 9 illustrates one example of a space-time surround view being simultaneously recorded by independent observers.

FIG. 10 illustrates one example of separation of a complex surround-view into smaller, linear parts.

FIG. 11 illustrates one example of a combination of multiple surround views into a multi-surround view.

FIG. 12 illustrates one example of a process for prompting a user for additional views of an object of interest to provide a more accurate surround view.

FIGS. 13A-13B illustrate an example of prompting a user for additional views of an object to be searched.

FIG. 14 illustrates one example of a process for navigating a surround view.

FIG. 15 illustrates an example of swipe-based navigation of a surround view.

FIG. 16A illustrates examples of a sharing service for surround views, as shown on a mobile device and browser.

FIG. 16B illustrates examples of surround view-related notifications on a mobile device.

FIG. 17A illustrates one example of a process for providing object segmentation.

FIG. 17B illustrates one example of a segmented object viewed from different angles.

FIG. 18 illustrates one example of various data sources that can be used for surround view generation and various applications that can be used with a surround view.

FIG. 19 illustrates one example of a process for providing visual search of an object, where the search query includes a surround view of the object and the data searched includes three-dimensional models.

FIG. 20 illustrates one example of a process for providing visual search of an object, where the search query includes a surround view of the object and the data searched includes two-dimensional images.

FIG. 21 illustrates an example of a visual search process.

FIG. 22 illustrates an example of a process for providing visual search of an object, where the search query includes a two-dimensional view of the object and the data searched includes surround view(s).

FIG. 23 illustrates a particular example of a computer system that can be used with various embodiments of the present invention.

DETAILED DESCRIPTION

Reference will now be made in detail to some specific examples of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the present disclosure is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. Particular embodiments of the present invention may be implemented without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

Various aspects of the present invention relate generally to systems and methods for analyzing the spatial relationship between multiple camera images and video streams together with location information data, for the purpose of creating a single representation, a surround view, which eliminates redundancy in the data, and presents a user with an interactive and immersive active viewing experience. According to various embodiments, active is described in the context of providing a user with the ability to control the viewpoint of the visual information displayed on a screen. In particular example embodiments, the surround view data structure (and associated algorithms) is natively built for, but not limited to, applications involving visual search.

According to various embodiments of the present invention, a surround view is a multi-view interactive digital media representation. With reference to FIG. 1, shown is one example of a surround view acquisition system 100. In the present example embodiment, the surround view acquisition system 100 is depicted in a flow sequence that can be used to generate a surround view. According to various embodiments, the data used to generate a surround view can come from a variety of sources. In particular, data such as, but not limited to, two-dimensional (2D) images 104 can be used to generate a surround view. These 2D images can include color image data streams such as multiple image sequences, video data, etc., or multiple images in any of various formats for images, depending on the application. Another source of data that can be used to generate a surround view includes location information 106. This location information 106 can be obtained from sources such as accelerometers, gyroscopes, magnetometers, GPS, WiFi, IMU-like systems (Inertial Measurement Unit systems), and the like. Yet another source of data that can be used to generate a surround view can include depth images 108. These depth images can include depth, 3D, or disparity image data streams, and the like, and can be captured by devices such as, but not limited to, stereo cameras, time-of-flight cameras, three-dimensional cameras, and the like.

In the present example embodiment, the data can then be fused together at sensor fusion block 110. In some embodiments, a surround view can be generated by a combination of data that includes both 2D images 104 and location information 106, without any depth images 108 provided. In other embodiments, depth images 108 and location information 106 can be used together at sensor fusion block 110. Various combinations of image data can be used with location information 106, depending on the application and available data.

In the present example embodiment, the data that has been fused together at sensor fusion block 110 is then used for content modeling 112 and context modeling 114. As described in more detail with regard to FIG. 4, the subject matter featured in the images can be separated into content and context. The content can be delineated as the object of interest and the context can be delineated as the scenery surrounding the object of interest. According to various embodiments, the content can be a three-dimensional model, depicting an object of interest, although the content can be a two-dimensional image in some embodiments, as described in more detail below with regard to FIG. 4. Furthermore, in some embodiments, the context can be a two-dimensional model depicting the scenery surrounding the object of interest. Although in many examples the context can provide two-dimensional views of the scenery surrounding the object of interest, the context can also include three-dimensional aspects in some embodiments. For instance, the context can be depicted as a “flat” image along a cylindrical “canvas,” such that the “flat” image appears on the surface of a cylinder. In addition, some examples may include three-dimensional context models, such as when some objects are identified in the surrounding scenery as three-dimensional objects. According to various embodiments, the models provided by content modeling 112 and context modeling 114 can be generated by combining the image and location information data, as described in more detail with regard to FIG. 3.

According to various embodiments, context and content of a surround view are determined based on a specified object of interest. In some examples, an object of interest is automatically chosen based on processing of the image and location information data. For instance, if a dominant object is detected in a series of images, this object can be selected as the content. In other examples, a user specified target 102 can be chosen, as shown in FIG. 1. It should be noted, however, that a surround view can be generated without a user specified target in some applications.

In the present example embodiment, one or more enhancement algorithms can be applied at enhancement algorithm(s) block 116. In particular example embodiments, various algorithms can be employed during capture of surround view data, regardless of the type of capture mode employed. These algorithms can be used to enhance the user experience. For instance, automatic frame selection, stabilization, view interpolation, filters, and/or compression can be used during capture of surround view data. In some examples, these enhancement algorithms can be applied to image data after acquisition of the data. In other examples, these enhancement algorithms can be applied to image data during capture of surround view data.

According to particular example embodiments, automatic frame selection can be used to create a more enjoyable surround view. Specifically, frames are automatically selected so that the transition between them will be smoother or more even. This automatic frame selection can incorporate blur- and overexposure-detection in some applications, as well as more uniformly sampling poses such that they are more evenly distributed.
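As a minimal sketch of the frame selection just described, and not the claimed method, the following assumes each frame already carries three hypothetical measurements: a camera angle, a sharpness score, and the fraction of saturated pixels. Frames that look blurred or overexposed are discarded, and the remaining frames are sampled so that the chosen poses are spread evenly over the captured angular range. The `Frame` fields, thresholds, and the nearest-angle sampling rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    angle_deg: float     # hypothetical camera pose angle for this frame
    sharpness: float     # hypothetical blur score; higher means sharper
    overexposed: float   # hypothetical fraction of saturated pixels, 0..1


def select_frames(frames, count, min_sharpness=0.5, max_overexposed=0.2):
    """Drop blurred/overexposed frames, then pick `count` evenly spaced poses."""
    if count <= 0:
        return []
    usable = [
        f for f in frames
        if f.sharpness >= min_sharpness and f.overexposed <= max_overexposed
    ]
    if len(usable) <= count:
        return sorted(usable, key=lambda f: f.angle_deg)

    lo = min(f.angle_deg for f in usable)
    hi = max(f.angle_deg for f in usable)
    if count == 1:
        targets = [(lo + hi) / 2]
    else:
        targets = [lo + i * (hi - lo) / (count - 1) for i in range(count)]

    chosen = []
    for target in targets:
        # Take the not-yet-chosen frame whose pose is closest to the target angle.
        remaining = [f for f in usable if f not in chosen]
        chosen.append(min(remaining, key=lambda f: abs(f.angle_deg - target)))
    return sorted(chosen, key=lambda f: f.angle_deg)
```

For example, with frames every 10 degrees from 0 to 90, a blurred frame at 30 degrees and an overexposed frame at 50 degrees, requesting four frames yields the frames at 0, 20, 60, and 90 degrees.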

In some example embodiments, stabilization can be used for a surround view in a manner similar to that used for video. In particular, keyframes in a surround view can be stabilized to produce improvements such as smoother transitions, improved/enhanced focus on the content, etc. However, unlike video, there are many additional sources of stabilization for a surround view, such as by using IMU information, depth information, computer vision techniques, direct selection of an area to be stabilized, face detection, and the like.

For instance, IMU information can be very helpful for stabilization. In particular, IMU information provides an estimate, although sometimes a rough or noisy estimate, of the camera tremor that may occur during image capture. This estimate can be used to remove, cancel, and/or reduce the effects of such camera tremor.
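One simple way to picture this, offered only as a sketch under stated assumptions, is to treat a smoothed version of the IMU-derived camera angle as the intended motion and the residual as tremor. The moving-average filter, the `radius` parameter, and the function names below are hypothetical choices; the disclosure does not prescribe a particular filter.

```python
def moving_average(values, radius):
    """Smooth a sequence with a centered window that shrinks at the ends."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def tremor_corrections(angles, radius=2):
    """Per-frame rotation to apply so each frame follows the smoothed path.

    `angles` is a hypothetical per-frame camera angle integrated from gyroscope
    readings. The correction is (smoothed - measured), i.e. minus the tremor.
    """
    smoothed = moving_average(angles, radius)
    return [s - a for a, s in zip(angles, smoothed)]


if __name__ == "__main__":
    # A steady pan with one jolt at the third frame.
    print(tremor_corrections([0.0, 0.0, 3.0, 0.0, 0.0], radius=1))
```

Because the IMU estimate may be rough or noisy, a practical system would likely combine such a correction with image-based cues rather than rely on it alone.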

In some examples, depth information, if available, can be used to provide stabilization for a surround view. Because points of interest in a surround view are three-dimensional, rather than two-dimensional, these points of interest are more constrained and tracking/matching of these points is simplified as the search space reduces. Furthermore, descriptors for points of interest can use both color and depth information and therefore, become more discriminative. In addition, automatic or semi-automatic content selection can be easier to provide with depth information. For instance, when a user selects a particular pixel of an image, this selection can be expanded to fill the entire surface that touches it. Furthermore, content can also be selected automatically by using a foreground/background differentiation based on depth. In various examples, the content can stay relatively stable/visible even when the context changes.
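The pixel-to-surface expansion mentioned above can be sketched as a region-growing pass over a depth image. This is an illustrative sketch only: it assumes the depth image is a small grid of depth values and that neighboring pixels belong to the same surface when their depths differ by no more than a hypothetical `tolerance`.

```python
from collections import deque


def grow_selection(depth, seed, tolerance=0.1):
    """Expand a selected pixel to the connected surface of similar depth.

    depth     -- 2D list of depth values (rows of equal length)
    seed      -- (row, col) of the pixel the user selected
    tolerance -- assumed maximum depth step between neighbors on one surface
    """
    rows, cols = len(depth), len(depth[0])
    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in selected:
                # Accept the neighbor only if depth changes smoothly across the edge.
                if abs(depth[nr][nc] - depth[r][c]) <= tolerance:
                    selected.add((nr, nc))
                    queue.append((nr, nc))
    return selected


if __name__ == "__main__":
    depth_image = [
        [1.0, 1.0, 5.0],
        [1.0, 1.05, 5.0],
        [5.0, 5.0, 5.0],
    ]
    print(sorted(grow_selection(depth_image, (0, 0))))
```

Here the near surface at a depth of about 1.0 is selected as content, while the pixels at depth 5.0 are left as context.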

According to various examples, computer vision techniques can also be used to provide stabilization for surround views. For instance, keypoints can be detected and tracked. However, in certain scenes, such as a dynamic scene or static scene with parallax, no simple warp exists that can stabilize everything. Consequently, there is a trade-off in which certain aspects of the scene receive more attention to stabilization and other aspects of the scene receive less attention. Because a surround view is often focused on a particular object of interest, a surround view can be content-weighted so that the object of interest is maximally stabilized in some examples.
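The content-weighting trade-off can be illustrated with a deliberately simple model: a single translation per frame, computed as a weighted average of tracked keypoint displacements in which keypoints on the object of interest count more. The translation-only warp, the weights, and the function name are assumptions for illustration; a real system would typically fit a richer warp.

```python
def content_weighted_shift(tracks, content_weight=4.0, context_weight=1.0):
    """Return the (dx, dy) shift that best cancels keypoint motion.

    tracks -- list of (dx, dy, is_content) keypoint displacements between two
              frames; is_content marks keypoints on the object of interest.
    The shift is the negative weighted mean displacement, so content keypoints
    end up more stable than context keypoints.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for dx, dy, is_content in tracks:
        weight = content_weight if is_content else context_weight
        sum_x += weight * dx
        sum_y += weight * dy
        total += weight
    if total == 0.0:
        return (0.0, 0.0)
    return (-sum_x / total, -sum_y / total)
```

With one content keypoint that moved 2 pixels and one context keypoint that moved 10 pixels, a content weight of 3 gives a shift of 4 pixels in the opposite direction, leaving a residual of 2 pixels on the content and 6 pixels on the context.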

Another way to improve stabilization in a surround view includes direct selection of a region of a screen. For instance, if a user taps to focus on a region of a screen, then records a convex surround view, the area that was tapped can be maximally stabilized. This allows stabilization algorithms to be focused on a particular area or object of interest.

In some examples, face detection can be used to provide stabilization. For instance, when recording with a front-facing camera, it is often likely that the user is the object of interest in the scene. Thus, face detection can be used to weight stabilization about that region. When face detection is precise enough, facial features themselves (such as eyes, nose, mouth) can be used as areas to stabilize, rather than using generic keypoints.

According to various examples, view interpolation can be used to improve the viewing experience. In particular, to avoid sudden “jumps” between stabilized frames, synthetic, intermediate views can be rendered on the fly. This can be informed by content-weighted keypoint tracks and IMU information as described above, as well as by denser pixel-to-pixel matches. If depth information is available, fewer artifacts resulting from mismatched pixels may occur, thereby simplifying the process. As described above, view interpolation can be applied during capture of a surround view in some embodiments. In other embodiments, view interpolation can be applied during surround view generation.
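To make the idea of intermediate views concrete, the sketch below linearly interpolates matched keypoint positions between two stabilized frames. Linear interpolation is an assumption chosen for brevity, not a technique named in the disclosure, and a renderer would still have to warp pixels to these positions. The function names are hypothetical.

```python
def interpolate_keypoints(keypoints_a, keypoints_b, t):
    """Blend matched keypoint positions; t=0 gives frame A, t=1 gives frame B."""
    return [
        ((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
        for (xa, ya), (xb, yb) in zip(keypoints_a, keypoints_b)
    ]


def intermediate_views(keypoints_a, keypoints_b, steps):
    """Keypoint layouts for `steps` synthetic views evenly spaced between A and B."""
    return [
        interpolate_keypoints(keypoints_a, keypoints_b, i / (steps + 1))
        for i in range(1, steps + 1)
    ]


if __name__ == "__main__":
    for view in intermediate_views([(0.0, 0.0)], [(4.0, 8.0)], steps=3):
        print(view)
```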

In some examples, filters can also be used during capture or generation of a surround view to enhance the viewing experience. Just as many popular photo sharing services provide aesthetic filters that can be applied to static, two-dimensional images, aesthetic filters can similarly be applied to surround images. However, because a surround view representation is more expressive than a two-dimensional image, and three-dimensional information is available in a surround view, these filters can be extended to include effects that are ill-defined in two-dimensional photos. For instance, in a surround view, motion blur can be added to the background (i.e. context) while the content remains crisp. In another example, a drop-shadow can be added to the object of interest in a surround view.

In various examples, compression can also be used as an enhancement algorithm 116. In particular, compression can be used to enhance user-experience by reducing data upload and download costs. Because surround views use spatial information, far less data can be sent for a surround view than a typical video, while maintaining desired qualities of the surround view. Specifically, the IMU, keypoint tracks, and user input, combined with the view interpolation described above, can all reduce the amount of data that must be transferred to and from a device during upload or download of a surround view. For instance, if an object of interest can be properly identified, a variable compression style can be chosen for the content and context. This variable compression style can include lower quality resolution for background information (i.e. context) and higher quality resolution for foreground information (i.e. content) in some examples. In such examples, the amount of data transmitted can be reduced by sacrificing some of the context quality, while maintaining a desired level of quality for the content.
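The variable compression style can be pictured with a toy quantizer, shown here purely as a sketch. It assumes a grayscale image stored as rows of integers and a hypothetical boolean mask marking content pixels; content is quantized with a fine step and context with a coarse step, so the context carries fewer distinct values and compresses further. The step sizes and names are illustrative assumptions, not parameters from the disclosure.

```python
def quantize(value, step):
    """Round an integer pixel value down to a multiple of `step`."""
    return (value // step) * step


def variable_quantize(pixels, content_mask, content_step=4, context_step=32):
    """Quantize content finely and context coarsely.

    pixels       -- 2D list of integer intensities
    content_mask -- 2D list of booleans, True where the pixel is content
    """
    return [
        [
            quantize(p, content_step if is_content else context_step)
            for p, is_content in zip(pixel_row, mask_row)
        ]
        for pixel_row, mask_row in zip(pixels, content_mask)
    ]


if __name__ == "__main__":
    print(variable_quantize([[100, 100]], [[True, False]]))
```

In the usage above, the content pixel keeps its value of 100 while the context pixel drops to 96, trading context fidelity for a smaller payload.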

In the present embodiment, a surround view 118 is generated after any enhancement algorithms are applied. The surround view can provide a multi-view interactive digital media representation. In various examples, the surround view can include a three-dimensional model of the content and a two-dimensional model of the context. However, in some examples, the context can represent a “flat” view of the scenery or background as projected along a surface, such as a cylindrical or other-shaped surface, such that the context is not purely two-dimensional. In yet other examples, the context can include three-dimensional aspects.

According to various embodiments, surround views provide numerous advantages over traditional two-dimensional images or videos. Some of these advantages include: the ability to cope with moving scenery, a moving acquisition device, or both; the ability to model parts of the scene in three-dimensions; the ability to remove unnecessary, redundant information and reduce the memory footprint of the output dataset; the ability to distinguish between content and context; the ability to use the distinction between content and context for improvements in the user-experience; the ability to use the distinction between content and context for improvements in memory footprint (an example would be high quality compression of content and low quality compression of context); the ability to associate special feature descriptors with surround views that allow the surround views to be indexed with a high degree of efficiency and accuracy; and the ability of the user to interact and change the viewpoint of the surround view. In particular example embodiments, the characteristics described above can be incorporated natively in the surround view representation, and provide the capability for use in various applications. For instance, surround views can be used to enhance various fields such as e-commerce, visual search, 3D printing, file sharing, user interaction, and entertainment.

According to various example embodiments, once a surround view 118 is generated, user feedback for acquisition 120 of additional image data can be provided. In particular, if a surround view is determined to need additional views to provide a more accurate model of the content or context, a user may be prompted to provide additional views. Once these additional views are received by the surround view acquisition system 100, these additional views can be processed by the system 100 and incorporated into the surround view.

With reference to FIG. 2, shown is an example of a process flow diagram for generating a surround view 200. In the present example, a plurality of images is obtained at 202. According to various embodiments, the plurality of images can include two-dimensional (2D) images or data streams. These 2D images can include location information that can be used to generate a surround view. In some embodiments, the plurality of images can include depth images 108, as also described above with regard to FIG. 1. The depth images can also include location information in various examples.

According to various embodiments, the plurality of images obtained at 202 can include a variety of sources and characteristics. For instance, the plurality of images can be obtained from a plurality of users. These images can be a collection of images gathered from the internet from different users of the same event, such as 2D images or video obtained at a concert, etc. In some examples, the plurality of images can include images with different temporal information. In particular, the images can be taken at different times of the same object of interest. For instance, multiple images of a particular statue can be obtained at different times of day, different seasons, etc. In other examples, the plurality of images can represent moving objects. For instance, the images may include an object of interest moving through scenery, such as a vehicle traveling along a road or a plane traveling through the sky. In other instances, the images may include an object of interest that is also moving, such as a person dancing, running, twirling, etc.

In the present example embodiment, the plurality of images is fused into content and context models at 204. According to various embodiments, the subject matter featured in the images can be separated into content and context. The content can be delineated as the object of interest and the context can be delineated as the scenery surrounding the object of interest. According to various embodiments, the content can be a three-dimensional model, depicting an object of interest, and the content can be a two-dimensional image in some embodiments.

According to the present example embodiment, one or more enhancement algorithms can be applied to the content and context models at 206. These algorithms can be used to enhance the user experience. For instance, enhancement algorithms such as automatic frame selection, stabilization, view interpolation, filters, and/or compression can be used. In some examples, these enhancement algorithms can be applied to image data during capture of the images. In other examples, these enhancement algorithms can be applied to image data after acquisition of the data.

In the present embodiment, a surround view is generated from the content and context models at 208. The surround view can provide a multi-view interactive digital media representation. In various examples, the surround view can include a three-dimensional model of the content and a two-dimensional model of the context. According to various embodiments, depending on the mode of capture and the viewpoints of the images, the surround view model can include certain characteristics. For instance, some examples of different styles of surround views include a locally concave surround view, a locally convex surround view, and a locally flat surround view. However, it should be noted that surround views can include combinations of views and characteristics, depending on the application.

With reference to FIG. 3, shown is one example of multiple camera views that can be fused together into a three-dimensional (3D) model to create an immersive experience. According to various embodiments, multiple images can be captured from various viewpoints and fused together to provide a surround view. In the present example embodiment, three cameras 312, 314, and 316 are positioned at locations 322, 324, and 326, respectively, in proximity to an object of interest 308. Scenery can surround the object of interest 308 such as object 310. Views 302, 304, and 306 from their respective cameras 312, 314, and 316 include overlapping subject matter. Specifically, each view 302, 304, and 306 includes the object of interest 308 and varying degrees of visibility of the scenery surrounding the object 310. For instance, view 302 includes a view of the object of interest 308 in front of the cylinder that is part of the scenery surrounding the object 310. View 306 shows the object of interest 308 to one side of the cylinder, and view 304 shows the object of interest without any view of the cylinder.

In the present example embodiment, the various views 302, 304, and 306 along with their associated locations 322, 324, and 326, respectively, provide a rich source of information about object of interest 308 and the surrounding context that can be used to produce a surround view. For instance, when analyzed together, the various views 302, 304, and 306 provide information about different sides of the object of interest and the relationship between the object of interest and the scenery. According to various embodiments, this information can be used to parse out the object of interest 308 into content and the scenery as the context. Furthermore, as also described above with regard to FIGS. 1 and 2, various algorithms can be applied to images produced by these viewpoints to create an immersive, interactive experience when viewing a surround view.

FIG. 4 illustrates one example of separation of content and context in a surround view. According to various embodiments of the present invention, a surround view is a multi-view interactive digital media representation of a scene 400. With reference to FIG. 4, shown is a user 402 located in a scene 400. The user 402 is capturing images of an object of interest, such as a statue. The images captured by the user constitute digital visual data that can be used to generate a surround view.

According to various embodiments of the present disclosure, the digital visual data included in a surround view can be, semantically and/or practically, separated into content 404 and context 406. According to particular embodiments, content 404 can include the object(s), person(s), or scene(s) of interest while the context 406 represents the remaining elements of the scene surrounding the content 404. In some examples, a surround view may represent the content 404 as three-dimensional data, and the context 406 as a two-dimensional panoramic background. In other examples, a surround view may represent both the content 404 and context 406 as two-dimensional panoramic scenes. In yet other examples, content 404 and context 406 may include three-dimensional components or aspects. In particular embodiments, the way that the surround view depicts content 404 and context 406 depends on the capture mode used to acquire the images.

In some examples, such as but not limited to: recordings of objects, persons, or parts of objects or persons, where only the object, person, or parts of them are visible, recordings of large flat areas, and recordings of scenes where the data captured appears to be at infinity (i.e., there are no subjects close to the camera), the content 404 and the context 406 may be the same. In these examples, the surround view produced may have some characteristics that are similar to other types of digital media such as panoramas. However, according to various embodiments, surround views include additional features that distinguish them from these existing types of digital media. For instance, a surround view can represent moving data. Additionally, a surround view is not limited to a specific cylindrical, spherical or translational movement. Various motions can be used to capture image data with a camera or other capture device. Furthermore, unlike a stitched panorama, a surround view can display different sides of the same object.

FIGS. 5A-5B illustrate examples of concave and convex views, respectively, where both views use a back-camera capture style. In particular, if a camera phone is used, these views use the camera on the back of the phone, facing away from the user. In particular embodiments, concave and convex views can affect how the content and context are designated in a surround view.

With reference to FIG. 5A, shown is one example of a concave view 500 in which a user is standing along a vertical axis 508. In this example, the user is holding a camera, such that camera location 502 does not leave axis 508 during image capture. However, as the user pivots about axis 508, the camera captures a panoramic view of the scene around the user, forming a concave view. In this embodiment, the object of interest 504 and the distant scenery 506 are all viewed similarly because of the way in which the images are captured. In this example, all objects in the concave view appear at infinity, so the content is equal to the context according to this view.

With reference to FIG. 5B, shown is one example of a convex view 520 in which a user changes position when capturing images of an object of interest 524. In this example, the user moves around the object of interest 524, taking pictures from different sides of the object of interest from camera locations 528, 530, and 532. Each of the images obtained includes a view of the object of interest, and a background of the distant scenery 526. In the present example, the object of interest 524 represents the content, and the distant scenery 526 represents the context in this convex view.

FIGS. 6A-6E illustrate examples of various capture modes for surround views. Although various motions can be used to capture a surround view and are not constrained to any particular type of motion, three general types of motion can be used to capture particular features or views described in conjunction with surround views. These three types of motion, respectively, can yield a locally concave surround view, a locally convex surround view, and a locally flat surround view. In some examples, a surround view can include various types of motions within the same surround view.

With reference to FIG. 6A, shown is an example of a back-facing, concave surround view being captured. According to various embodiments, a locally concave surround view is one in which the viewing angles of the camera or other capture device diverge. In one dimension this can be likened to the motion required to capture a spherical 360 panorama (pure rotation), although the motion can be generalized to any curved sweeping motion in which the view faces outward. In the present example, the experience is that of a stationary viewer looking out at a (possibly dynamic) context.

In the present example embodiment, a user 602 is using a back-facing camera 606 to capture images towards world 600, and away from user 602. As described in various examples, a back-facing camera refers to a device with a camera that faces away from the user, such as the camera on the back of a smart phone. The camera is moved in a concave motion 608, such that views 604 a, 604 b, and 604 c capture various parts of capture area 609.

With reference to FIG. 6B, shown is an example of a back-facing, convex surround view being captured. According to various embodiments, a locally convex surround view is one in which viewing angles converge toward a single object of interest. In some examples, a locally convex surround view can provide the experience of orbiting about a point, such that a viewer can see multiple sides of the same object. This object, which may be an “object of interest,” can be segmented from the surround view to become the content, and any surrounding data can be segmented to become the context. Previous technologies fail to recognize this type of viewing angle in the media-sharing landscape.

In the present example embodiment, a user 602 is using a back-facing camera 614 to capture images towards world 600, and away from user 602. The camera is moved in a convex motion 610, such that views 612a, 612b, and 612c capture various parts of capture area 611. As described above, world 600 can include an object of interest in some examples, and the convex motion 610 can orbit around this object. Views 612a, 612b, and 612c can include views of different sides of this object in these examples.

With reference to FIG. 6C, shown is an example of a front-facing, concave surround view being captured. As described in various examples, a front-facing camera refers to a device with a camera that faces towards the user, such as the camera on the front of a smart phone. For instance, front-facing cameras are commonly used to take “selfies” (i.e., self-portraits of the user).

In the present example embodiment, camera 620 is facing user 602. The camera follows a concave motion 606 such that the views 618a, 618b, and 618c diverge from each other in an angular sense. The capture area 617 follows a concave shape that includes the user at a perimeter.

With reference to FIG. 6D, shown is an example of a front-facing, convex surround view being captured. In the present example embodiment, camera 626 is facing user 602. The camera follows a convex motion 622 such that the views 624a, 624b, and 624c converge towards the user 602. The capture area 617 follows a concave shape that surrounds the user 602.

With reference to FIG. 6E, shown is an example of a back-facing, flat view being captured. In particular example embodiments, a locally flat surround view is one in which the rotation of the camera is small compared to its translation. In a locally flat surround view, the viewing angles remain roughly parallel, and the parallax effect dominates. In this type of surround view, there can also be an “object of interest”, but its position does not remain fixed in the different views. Previous technologies also fail to recognize this type of viewing angle in the media-sharing landscape.

In the present example embodiment, camera 632 is facing away from user 602, and towards world 600. The camera follows a generally linear motion 628 such that the capture area 629 generally follows a line. The views 630a, 630b, and 630c have generally parallel lines of sight. An object viewed in multiple views can appear to have different or shifted background scenery in each view. In addition, a slightly different side of the object may be visible in different views. Using the parallax effect, information about the position and characteristics of the object can be generated in a surround view that provides more information than any one static image.

As described above, various modes can be used to capture images for a surround view. These modes, including locally concave, locally convex, and locally linear motions, can be used during capture of separate images or during continuous recording of a scene. Such recording can capture a series of images during a single session.
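The distinction between the three capture modes can be illustrated with a minimal sketch, assuming that a two-dimensional position and a unit viewing direction are available for each captured view. The function name `classify_capture_mode`, the `flat_angle_deg` threshold, and the alignment heuristic are illustrative assumptions and are not taken from the described embodiments. The sketch treats a capture as locally flat when the total camera rotation is small, as locally convex when the viewing directions turn against the direction of travel (converging rays), and as locally concave otherwise (diverging rays, including pure rotation).

```python
import math


def classify_capture_mode(positions, directions, flat_angle_deg=5.0):
    """Classify a capture as 'flat', 'convex', or 'concave'.

    positions:  list of (x, y) camera positions, in capture order.
    directions: list of unit (dx, dy) viewing directions, one per position.
    flat_angle_deg is a hypothetical threshold on total rotation.
    """
    total_turn = 0.0   # accumulated rotation of the viewing direction (radians)
    alignment = 0.0    # sum of dot(change in position, change in direction)
    for i in range(len(positions) - 1):
        (px0, py0), (px1, py1) = positions[i], positions[i + 1]
        (dx0, dy0), (dx1, dy1) = directions[i], directions[i + 1]
        cross = dx0 * dy1 - dy0 * dx1
        dot = dx0 * dx1 + dy0 * dy1
        total_turn += abs(math.atan2(cross, dot))
        alignment += (px1 - px0) * (dx1 - dx0) + (py1 - py0) * (dy1 - dy0)

    # Small rotation compared to translation: viewing angles stay parallel.
    if math.degrees(total_turn) < flat_angle_deg:
        return "flat"
    # Directions turn against the travel direction: rays converge on an object.
    if alignment < 0:
        return "convex"
    # Directions turn with the travel direction (or pure rotation): rays diverge.
    return "concave"
```

For a camera orbiting an object while pointing at it, the sketch returns "convex"; for a camera swept on a small arc while pointing outward, it returns "concave"; for a sideways translation with a fixed viewing direction, it returns "flat".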

According to various embodiments of the present invention, a surround view can be generated from data acquired in numerous ways. FIG. 7 illustrates one example of a process for recording data that can be used to generate a surround view. In this example, data is acquired by moving a camera through space. In particular, a user taps a record button 702 on a capture device 700 to begin recording. As movement of the capture device 716 follows a generally leftward direction, an object 714 moves in a generally rightward motion across the screen, as indicated by movement of object 716. Specifically, the user presses the record button 702 in view 708, and then moves the capture device leftward in view 710. As the capture device moves leftward, object 714 appears to move rightward between views 710 and 712. In some examples, when the user is finished recording, the record button 702 can be tapped again. In other examples, the user can tap and hold the record button during recording, and release to stop recording. In the present embodiment, the recording captures a series of images that can be used to generate a surround view.

According to various embodiments, once a series of images is captured, these images can be used to generate a surround view. With reference to FIG. 8, shown is an example of a surround view in which three-dimensional content is blended with a two-dimensional panoramic context. In the present example embodiment, the movement of capture device 820 follows a locally convex motion, such that the capture device moves around the object of interest (i.e., a person sitting in a chair). The object of interest is delineated as the content 808, and the surrounding scenery (i.e., the room) is delineated as the context 810. In the present embodiment, as the movement of the capture device 820 moves leftwards around the content 808, the direction of content rotation relative to the capture device 812 is in a rightward, counterclockwise direction. Views 802, 804, and 806 show a progression of the rotation of the person sitting in a chair relative to the room.

According to various embodiments, a series of images used to generate a surround view can be captured by a user recording a scene, object of interest, etc. Additionally, in some examples, multiple users can contribute to acquiring a series of images used to generate a surround view. With reference to FIG. 9, shown is one example of a space-time surround view being simultaneously recorded by independent observers.

In the present example embodiment, cameras 904, 906, 908, 910, 912, and 914 are positioned at different locations. In some examples, these cameras 904, 906, 908, 910, 912, and 914 can be associated with independent observers. For instance, the independent observers could be audience members at a concert, show, event, etc. In other examples, cameras 904, 906, 908, 910, 912, and 914 could be placed on tripods, stands, etc. In the present embodiment, the cameras 904, 906, 908, 910, 912, and 914 are used to capture views 904a, 906a, 908a, 910a, 912a, and 914a, respectively, of an object of interest 900, with world 902 providing the background scenery. The images captured by cameras 904, 906, 908, 910, 912, and 914 can be aggregated and used together in a single surround view in some examples. Each of the cameras 904, 906, 908, 910, 912, and 914 provides a different vantage point relative to the object of interest 900, so aggregating the images from these different locations provides information about different viewing angles of the object of interest 900. In addition, cameras 904, 906, 908, 910, 912, and 914 can provide a series of images from their respective locations over a span of time, such that the surround view generated from these series of images can include temporal information and can also indicate movement over time.

As described above with regard to various embodiments, surround views can be associated with a variety of capture modes. In addition, a surround view can include different capture modes or different capture motions in the same surround view. Accordingly, surround views can be separated into smaller parts in some examples. With reference to FIG. 10, shown is one example of separation of a complex surround view into smaller, linear parts. In the present example, complex surround view 1000 includes a capture area 1026 that follows a sweeping L motion, which includes two separate linear motions 1022 and 1024 of camera 1010. The surround views associated with these separate linear motions can be broken down into linear surround view 1002 and linear surround view 1004. It should be noted that although linear motions 1022 and 1024 can be captured sequentially and continuously in some embodiments, these linear motions 1022 and 1024 can also be captured in separate sessions in other embodiments.

In the present example embodiment, linear surround view 1002 and linear surround view 1004 can be processed independently, and joined with a transition 1006 to provide a continuous experience for the user. Breaking down motion into smaller linear components in this manner can provide various advantages. For instance, breaking down these smaller linear components into discrete, loadable parts can aid in compression of the data for bandwidth purposes. Similarly, non-linear surround views can also be separated into discrete components. In some examples, surround views can be broken down based on local capture motion. For example, a complex motion may be broken down into a locally convex portion and a linear portion. In another example, a complex motion can be broken down into separate locally convex portions. It should be recognized that any number of motions can be included in a complex surround view 1000, and that a complex surround view 1000 can be broken down into any number of separate portions, depending on the application.
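One way the separation of a sweeping L motion into linear parts could be approached is sketched below, assuming the camera path is available as a list of two-dimensional positions. The function name `split_into_linear_parts` and the `max_turn_deg` threshold are hypothetical, and splitting on heading change is only one possible criterion. Adjacent parts share the corner position, which is where a transition between the parts would be placed.

```python
import math


def split_into_linear_parts(path, max_turn_deg=30.0):
    """Split a camera path into roughly linear parts.

    path: list of (x, y) camera positions in capture order.
    A new part starts wherever the heading changes by more than
    max_turn_deg (a hypothetical threshold) between consecutive steps.
    """
    if len(path) < 3:
        return [list(path)]

    parts = []
    current = [path[0], path[1]]
    for nxt in path[2:]:
        (ax, ay), (bx, by) = current[-2], current[-1]
        heading_prev = math.atan2(by - ay, bx - ax)
        heading_next = math.atan2(nxt[1] - by, nxt[0] - bx)
        # Wrap the heading difference into [-pi, pi] before comparing.
        diff = heading_next - heading_prev
        turn = abs(math.atan2(math.sin(diff), math.cos(diff)))
        if math.degrees(turn) > max_turn_deg:
            parts.append(current)
            # The corner point is shared so the parts can be joined later.
            current = [current[-1], nxt]
        else:
            current.append(nxt)
    parts.append(current)
    return parts
```

Applied to an L-shaped path such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]`, the sketch yields two parts that meet at `(2, 0)`.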

Although in some applications, it is desirable to separate complex surround views, in other applications it is desirable to combine multiple surround views. With reference to FIG. 11, shown is one example of a graph that includes multiple surround views combined into a multi-surround view 1100. In this example, the rectangles represent various surround views 1102, 1104, 1106, 1108, 1110, 1112, 1114, and 1116, and the length of each rectangle indicates the dominant motion of each surround view. Lines between the surround views indicate possible transitions 1118, 1120, 1122, 1124, 1126, 1128, 1130, and 1132 between them.

In some examples, a surround view can provide a way to partition a scene both spatially and temporally in a very efficient manner. For very large scale scenes, multi-surround view 1100 data can be used. In particular, a multi-surround view 1100 can include a collection of surround views that are connected together in a spatial graph. The individual surround views can be collected by a single source, such as a single user, or by multiple sources, such as multiple users. In addition, the individual surround views can be captured in sequence, in parallel, or totally uncorrelated at different times. However, in order to connect the individual surround views, there must be some overlap of content, context, or location, or of a combination of these features. Accordingly, any two surround views would need to have some overlap in content, context, and/or location to provide a portion of a multi-surround view 1100. Individual surround views can be linked to one another through this overlap and stitched together to form a multi-surround view 1100. According to various examples, any combination of capture devices with either front, back, or front and back cameras can be used.

In some embodiments, multi-surround views 1100 can be generalized to more fully capture entire environments. Much like “photo tours” collect photographs into a graph of discrete, spatially-neighboring components, multiple surround views can be combined into an entire scene graph. In some examples, this can be achieved using information obtained from but not limited to: image matching/tracking, depth matching/tracking, IMU, user input, and/or GPS. Within such a graph or multi-surround view, a user can switch between different surround views either at the end points of the recorded motion or wherever there is an overlap with other surround views in the graph. One advantage of multi-surround views over “photo tours” is that a user can navigate the surround views as desired and much more visual information can be stored in surround views. In contrast, traditional “photo tours” typically have limited views that can be shown to the viewer either automatically or by allowing the user to pan through a panorama with a computer mouse or keystrokes.
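The linking of individual surround views through overlap in content, context, and/or location can be sketched as a simple graph construction. This is an illustrative sketch only: the `SurroundView` record, the use of tag sets to stand in for content and context, the planar `location` coordinates, and the `max_distance` threshold are all assumptions made for the example, whereas an actual system might rely on image matching, depth matching, IMU, or GPS data as noted above.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SurroundView:
    """Hypothetical summary record for one surround view."""
    view_id: str
    content_tags: frozenset   # labels standing in for recognized content
    context_tags: frozenset   # labels standing in for recognized context
    location: tuple           # (x, y) in arbitrary local units


def overlaps(a, b, max_distance=10.0):
    """True if two views share content, context, or a nearby location."""
    shares_content = bool(a.content_tags & b.content_tags)
    shares_context = bool(a.context_tags & b.context_tags)
    is_near = math.dist(a.location, b.location) <= max_distance
    return shares_content or shares_context or is_near


def build_multi_surround_view(views, max_distance=10.0):
    """Return an adjacency map: view_id -> set of linked view_ids."""
    graph = {view.view_id: set() for view in views}
    for a, b in combinations(views, 2):
        if overlaps(a, b, max_distance):
            graph[a.view_id].add(b.view_id)
            graph[b.view_id].add(a.view_id)
    return graph
```

In such a graph, the neighbors of the surround view currently being shown are the candidates a user could switch to at an overlap.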

According to various embodiments, a surround view is generated from a set of images. These images can be captured by a user intending to produce a surround view or retrieved from storage, depending on the application. Because a surround view is not limited or restricted with respect to a certain amount of visibility, it can provide significantly more visual information about different views of an object or scene. More specifically, although a single viewpoint may be too ambiguous to adequately describe a three-dimensional object, multiple views of the object can provide more specific and detailed information. These multiple views can provide enough information to allow a visual search query to yield more accurate search results. Because a surround view provides views from many sides of an object, distinctive views that are appropriate for search can be selected from the surround view or requested from a user if a distinctive view is not available. For instance, if the data captured or otherwise provided is not sufficient to allow recognition or generation of the object or scene of interest with a sufficiently high certainty, a capturing system can guide a user to continue moving the capturing device or provide additional image data. In particular embodiments, if a surround view is determined to need additional views to produce a more accurate model, a user may be prompted to provide additional images.

With reference to FIG. 12, shown is one example of a process for prompting a user for additional images 1200 to provide a more accurate surround view. In the present example, images are received from a capturing device or storage at 1202. Next, a determination is made whether the images provided are sufficient to allow recognition of an object of interest at 1204. If the images are not sufficient to allow recognition of an object of interest, then a prompt is given for the user to provide additional image(s) from different viewing angles at 1206. In some examples, prompting a user to provide one or more additional images from different viewing angles can include suggesting one or more particular viewing angles. If the user is actively capturing images, the user can be prompted when a distinct viewing angle is detected in some instances. According to various embodiments, suggestions to provide one or more particular viewing angles can be determined based on the locations associated with the images already received. In addition, prompting a user to provide one or more additional images from different viewing angles can include suggesting using a particular capture mode such as a locally concave surround view, a locally convex surround view, or a locally flat surround view, depending on the application.

Next, the system receives these additional image(s) from the user at 1208. Once the additional images are received, a determination is made again whether the images are sufficient to allow recognition of an object of interest. This process continues until a determination is made that the images are sufficient to allow recognition of an object of interest. In some embodiments, the process can end at this point and a surround view can be generated.

Optionally, once a determination is made that the images are sufficient to allow recognition of an object of interest, a determination can then be made whether the images are sufficient to distinguish the object of interest from similar but non-matching items at 1210. This determination can be helpful especially when using visual search, examples of which are described in more detail below with regard to FIGS. 19-22. In particular, an object of interest may have distinguishing features that can be seen from particular angles that require additional views. For instance, a portrait of a person may not sufficiently show the person's hairstyle if only pictures are taken from the front angles. Additional pictures of the back of the person may need to be provided to determine whether the person has short hair or just a pulled-back hairstyle. In another example, a picture of a person wearing a shirt might warrant additional prompting if it is plain on one side and additional views would show prints or other insignia on the sleeves or back, etc.

In some examples, determining that the images are not sufficient to distinguish the object of interest from similar but non-matching items includes determining that the number of matching search results exceeds a predetermined threshold. In particular, if a large number of search results are found, then it can be determined that additional views may be needed to narrow the search criteria. For instance, if a search of a mug yields a large number of matches, such as more than 20, then additional views of the mug may be needed to prune the search results.

If the images are not sufficient to distinguish the object of interest from similar but non-matching items at 1210, then a prompt is given for the user to provide additional image(s) from different viewing angles at 1212. In some examples, prompting a user to provide one or more additional images from different viewing angles can include suggesting one or more particular viewing angles. If the user is actively capturing images, the user can be prompted when a distinct viewing angle is detected in some instances. According to various embodiments, suggestions to provide one or more particular viewing angles can be determined based on the locations associated with the images already received. In addition, prompting a user to provide one or more additional images from different viewing angles can include suggesting using a particular capture mode such as a locally concave surround view, a locally convex surround view, or a locally flat surround view, depending on the application.

Next, the system receives these additional image(s) from the user at 1214. Once the additional images are received, a determination is made again whether the images are sufficient to distinguish the object of interest from similar but non-matching items. This process continues until a determination is made that the images are sufficient to distinguish the object of interest from similar but non-matching items. Next, the process ends and a surround view can be generated from the images.
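The control flow of the process described above with regard to FIG. 12 can be summarized in a short sketch. The three callbacks (`can_recognize`, `count_matches`, `request_more`) are hypothetical placeholders for the recognition check at 1204, the visual search used at 1210, and the prompts at 1206 and 1212; they are not part of any particular library. The `max_matches` default of 20 follows the mug example above, while the `max_prompts` guard is an added assumption so the sketch cannot loop indefinitely.

```python
def gather_sufficient_images(images, can_recognize, count_matches,
                             request_more, max_matches=20, max_prompts=10):
    """Prompt for more images until the object is recognizable and distinct.

    images:        initial images received at 1202.
    can_recognize: callable(images) -> bool, recognition check at 1204.
    count_matches: callable(images) -> int, number of matching search results.
    request_more:  callable(reason, images) -> list of additional images.
    """
    images = list(images)
    prompts = 0

    # Steps 1204-1208: repeat until the object of interest can be recognized.
    while not can_recognize(images):
        if prompts >= max_prompts:
            raise RuntimeError("object could not be recognized")
        images.extend(request_more("recognize", images))
        prompts += 1

    # Steps 1210-1214 (optional): repeat until the number of matching
    # results no longer exceeds the predetermined threshold.
    while count_matches(images) > max_matches:
        if prompts >= max_prompts:
            raise RuntimeError("object could not be distinguished")
        images.extend(request_more("distinguish", images))
        prompts += 1

    # At this point a surround view can be generated from the images.
    return images
```

The `reason` argument lets a caller vary the prompt, for example by suggesting a particular viewing angle or capture mode.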

With reference to FIGS. 13A-13B, shown are examples of prompts requesting additional images from a user in order to produce a more accurate surround view. In particular, a device 1300 is shown with a search screen. In FIG. 13A, an example of a visual search query 1302 is provided. This visual search query 1302 includes an image of a white mug. The results 1306 include various mugs with a white background. In particular embodiments, if a large number of search results is found, a prompt 1304 can be provided to request additional image data from the user for the search query.

In FIG. 13B, an example of another visual search query 1310 is provided in response to prompt 1304 in FIG. 13A. This visual search query 1310 provides a different viewpoint of the object and provides more specific information about the graphics on the mug. This visual search query 1310 yields new results 1312 that are more targeted and accurate. In some examples, an additional prompt 1308 can be provided to notify the user that the search is complete.

Once a surround view is generated, it can be used in various applications, in particular embodiments. One application for a surround view includes allowing a user to navigate a surround view or otherwise interact with it. According to various embodiments, a surround view is designed to simulate the feeling of being physically present in a scene as the user interacts with the surround view. This experience depends not only on the viewing angle of the camera, but also on the type of surround view that is being viewed. Although a surround view does not need to have a specific fixed geometry overall, different types of geometries can be represented over a local segment of a surround view, such as a concave, convex, or flat surround view, in particular embodiments.

In particular example embodiments, the mode of navigation is informed by the type of geometry represented in a surround view. For instance, with concave surround views, the act of rotating a device (such as a smartphone, etc.) can mimic that of rotating a stationary observer who is looking out at a surrounding scene. In some applications, swiping the screen in one direction can cause the view to rotate in the opposite direction. This effect is akin to having a user stand inside a hollow cylinder and pushing its walls to rotate around the user. In other examples with convex surround views, rotating the device can cause the view to orbit in the direction it is leaning into, such that the object of interest remains centered. In some applications, swiping the screen in one direction causes the viewing angle to rotate in the same direction: this creates the sensation of rotating the object of interest about its axis or having the user rotate around the object. In some examples with flat views, rotating or moving a device can cause the view to translate in the direction of the device's movement. In addition, swiping the screen in one direction can cause the view to translate in the opposite direction, as if pushing foreground objects to the side.
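The dependence of swipe handling on the local geometry can be sketched as follows. The function name `apply_swipe`, the representation of the view state as an angle in degrees and a lateral offset, and the `gain` factor are assumptions made for illustration; the sign conventions follow the behavior described above for concave, convex, and flat surround views.

```python
def apply_swipe(geometry, swipe_dx, angle_deg=0.0, offset=0.0, gain=0.5):
    """Return the (angle_deg, offset) of the view after a horizontal swipe.

    geometry: 'concave', 'convex', or 'flat'.
    swipe_dx: signed swipe distance in pixels (positive = rightward).
    gain:     hypothetical scale from pixels to degrees or offset units.
    """
    step = gain * swipe_dx
    if geometry == "concave":
        # View rotates opposite to the swipe, like pushing a cylinder wall.
        return ((angle_deg - step) % 360.0, offset)
    if geometry == "convex":
        # Viewing angle rotates with the swipe, orbiting the object.
        return ((angle_deg + step) % 360.0, offset)
    if geometry == "flat":
        # View translates opposite to the swipe, pushing the foreground aside.
        return (angle_deg, offset - step)
    raise ValueError(f"unknown geometry: {geometry!r}")
```

The same dispatch could be driven by device rotation or tilt instead of a swipe, with the input mapped to a signed step in the same way.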

In some examples, a user may be able to navigate a multi-surround view or a graph of surround views in which individual surround views can be loaded piece by piece and further surround views may be loaded when necessary (e.g. when they are adjacent to/overlap the current surround view and/or the user navigates towards them). If the user reaches a point in a surround view where two or more surround views overlap, the user can select which of those overlapping surround views to follow. In some instances, the selection of which surround view to follow can be based on the direction the user swipes or moves the device.

With reference to FIG. 14, shown is one example of a process for navigating a surround view 1400. In the present example, a request is received from a user to view an object of interest in a surround view at 1402. According to various embodiments, the request can also be a generic request to view a surround view without a particular object of interest, such as when viewing a landscape or panoramic view. Next, a three-dimensional model of the object is accessed at 1404. This three-dimensional model can include all or a portion of a stored surround view. For instance, the three-dimensional model can be a segmented content view in some applications. An initial image is then sent from a first viewpoint to an output device at 1406. This first viewpoint serves as a starting point for viewing the surround view on the output device.

In the present embodiment, a user action is then received to view the object of interest from a second viewpoint. This user action can include moving (e.g. tilting, translating, rotating, etc.) an input device, swiping the screen, etc., depending on the application. For instance, the user action can correspond to motion associated with a locally concave surround view, a locally convex surround view, or a locally flat surround view, etc. Based on the characteristics of the user action, the three-dimensional model is processed at 1410. For instance, movement of the input device can be detected and a corresponding viewpoint of the object of interest can be found. Depending on the application, the input device and output device can both be included in a mobile device, etc. In some examples, the requested image corresponds to an image captured prior to generation of the surround view. In other examples the requested image is generated based on the three-dimensional model (e.g. by interpolation, etc.). An image from this viewpoint can be sent to the output device at 1412. In some embodiments, the selected image can be provided to the output device along with a degree of certainty as to the accuracy of the selected image. For instance, when interpolation algorithms are used to generate an image from a particular viewpoint, the degree of certainty can vary and may be provided to a user in some applications. In other examples, a message can be provided to the output device indicating if there is insufficient information in the surround view to provide the requested images.

In some embodiments, intermediate images can be sent between the initial image at 1406 and the requested image at 1412. In particular, these intermediate images can correspond to viewpoints located between a first viewpoint associated with the initial image and a second viewpoint associated with the requested image. Furthermore, these intermediate images can be selected based on the characteristics of the user action. For instance, the intermediate images can follow the path of movement of the input device associated with the user action, such that the intermediate images provide a visual navigation of the object of interest.
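A minimal sketch of selecting intermediate viewpoints is given below, assuming that each viewpoint of a locally convex surround view can be described by a single orbit angle in degrees. The function names and the choice to step evenly along the shorter arc are illustrative assumptions; an actual implementation could instead follow the recorded path of the input device or interpolate new images from the three-dimensional model.

```python
def _signed_arc(from_deg, to_deg):
    """Signed shortest angular difference from from_deg to to_deg."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0


def intermediate_viewpoints(first_deg, second_deg, steps):
    """Evenly spaced angles strictly between the first and second viewpoint."""
    delta = _signed_arc(first_deg, second_deg)
    return [(first_deg + delta * k / (steps + 1)) % 360.0
            for k in range(1, steps + 1)]


def nearest_captured_image(captured_degs, target_deg):
    """Pick the captured viewpoint angle closest to the target angle."""
    return min(captured_degs, key=lambda a: abs(_signed_arc(target_deg, a)))
```

For example, moving from a viewpoint at 350 degrees to one at 20 degrees with two intermediate steps yields viewpoints at 0 and 10 degrees, crossing the wrap-around rather than sweeping the long way around the object.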

With reference to FIG. 15, shown is an example of swipe-based navigation of a surround view. In the present example, three views of device 1500 are shown as a user navigates a surround view. In particular, the input 1510 is a swipe by the user on the screen of device 1500. As the user swipes from right to left, the object of interest moves relative to the direction of swipe 1508. Specifically, as shown by the progression of images 1506, 1504, and 1502, the input 1510 allows the user to rotate around the object of interest (i.e., the man wearing sunglasses).

In the present example, a swipe on a device screen can correspond to rotation of a virtual view. However, other input modes can be used in other example embodiments. For instance, a surround view can also be navigated by tilting a device in various directions and using the device orientation direction to guide the navigation in the surround view. In another example, the navigation can also be based on movement of the screen by the user. Accordingly, a sweeping motion can allow the user to see around the surround view as if the viewer were pointing the device at the object of interest. In yet another example, a website can be used to provide interaction with the surround view in a web-browser. In this example, swipe and/or motion sensors may be unavailable, and can be replaced by interaction with a mouse or other cursor or input device.

According to various embodiments, surround views can be stored and accessed in various ways. In addition, surround views can be used in many applications. With reference to FIG. 16A, shown are examples of a sharing service for surround views on a mobile device 1602 and browser 1604. The mobile device 1602 and browser 1604 are shown as alternate thumbnail displays 1600, because the surround views can be accessed by either interface, depending on the application. According to various embodiments, a set of surround views can be presented to a user in different ways, including but not limited to: a gallery, a feed, and/or a website. For instance, a gallery can be used to present a collection of thumbnails to a user. These thumbnails can be selected from the surround views either by the user or automatically. In some examples, the size of the thumbnails can vary based on characteristics such as, but not limited to: an automatically selected size that is based on the structure and size of the content it contains; and/or the popularity of the surround view. In another example, a feed can be used to present surround views using interactive thumbnails.

In the present example, surround view thumbnails from a mobile device 1602 include thumbnails 1604 and title/label/description 1604. The thumbnails 1604 can include an image from the surround view. The title/label/description 1604 can include information about the surround view such as title, file name, description of the content, labels, tags, etc.

Furthermore, in the present example, surround view thumbnails from a browser 1604 include thumbnails 1606, title/label/description 1608, and notifications 1610. The thumbnails 1606 can include an image from the surround view. The title/label/description 1608 can include information about the surround view such as title, file name, description of the content, labels, tags, etc. In addition, notifications 1610 can include information such as comments on a surround view, updates about matching content, suggested content, etc. Although not shown on the mobile version, notifications can also be included, but may be omitted in the interest of layout and space considerations in some embodiments. In some examples, notifications can be provided as part of a surround view application on a mobile device.

With reference to FIG. 16B, shown are examples of surround view-related notifications on a mobile device. In particular, alternative notification screens 1620 for a device 1622 are shown that include different formats for notifications. In some examples, a user can navigate between these screens depending on the user's preferences.

In the present example, screen 1624 includes a notification 1626 that includes a recommendation to the user based on content from recent surround views. In particular, the recommendation relates to a trip to Greece based on the application's finding that the user has an affinity for statues. This finding can be inferred from content found in the user's stored or recently browsed surround views, in some examples.

In the present example, screen 1628 includes notifications 1630 based on content from surround views that the user has stored, browsed, etc. For instance, one notification is a recommendation for a pair of shoes available at a nearby retailer that are similar to the user's shoes as provided in a surround view model. The recommendation also includes a link to a map to the retailer. This recommendation can be based on a surround view that the user has saved of a pair of shoes. The other notification is a recommendation to connect to another user that shares a common interest/hobby. In this example, the recommendation is based on the user's detected interest in hats. These recommendations can be provided automatically in some applications as “push” notifications. The content of the recommendations can be based on the user's surround views or browsing history, and visual search algorithms, such as those described with regard to FIGS. 19-22, can be used in some examples.

Screen 1630 shows another form of notification 1632 in the present example. Various icons for different applications are featured on screen 1630. The icon for the surround view application includes a notification 1632 embedded into the icon that shows how many notifications are waiting for the user. When the user selects the icon, the notifications can be displayed and/or the application can be launched, according to various embodiments.

According to various embodiments of the present disclosure, surround views can be used to segment, or separate, objects from static or dynamic scenes. Because surround views include distinctive 3D modeling characteristics and information derived from image data, surround views provide a unique opportunity for segmentation. In some examples, by treating an object of interest as the surround view content, and expressing the remainder of the scene as the context, the object can be segmented out and treated as a separate entity. Additionally, the surround view context can be used to refine the segmentation process in some instances. In various embodiments, the content can be chosen either automatically or semi-automatically using user-guided interaction. One important use for surround view object segmentation is in the context of product showcases in e-commerce, an example of which is shown in FIG. 17B. In addition, surround view-based object segmentation can be used to generate object models that are suited for training artificial intelligence search algorithms that can operate on large databases, in the context of visual search applications.

With reference to FIG. 17, shown is one example of a process for providing object segmentation 1700. At 1702, a first surround view of an object is obtained. Next, content is selected from the first surround view at 1704. In some examples, the content is selected automatically without user input. In other examples, the content is selected semi-automatically using user-guided interaction. The content is then segmented from the first surround view at 1706. In some examples, the content is segmented by reconstructing a model of the content in three dimensions based on the information provided in the first surround view, including images from multiple camera viewpoints. In particular example embodiments, a mechanism for selecting and initializing a segmentation algorithm based on iterative optimization algorithms (such as graphical models) can be efficiently employed by reconstructing the object of interest, or parts of it, in three dimensions from multiple camera viewpoints available in a surround view. This process can be repeated over multiple frames, and optimized until segmentation reaches a desired quality output. In addition, segmenting the content can include using the context to determine parameters of the content.

In the present example, once the content is segmented from the first surround view, a second surround view is generated that includes the object without the context or scenery surrounding the object. At 1708, this second surround view is provided. In some examples, the second surround view can then be stored in a database. This second surround view can be used in various applications. For instance, the segmented content can include a product for use in e-commerce. As illustrated in FIG. 17B, the segmented content can be used to show a product from various viewpoints. Another application includes using the second surround view as an object model for artificial intelligence training. In yet another application, the second surround view can be used in 3D printing. In this application, data from the second surround view is sent to a 3D printer.

Although the present example describes segmenting out content from a first surround view, it should be noted that context can also be segmented out in other examples. For instance, the background scenery can be segmented out and presented as a second surround view in some applications. In particular, the context can be selected from the first surround view and the context can be segmented from the first surround view, such that the context is separated into a distinct interactive model. The resulting surround view would then include the scenery surrounding an object but exclude the object itself. A segmented context model can also be used in various applications. For instance, data from the resulting surround view can be sent to a 3D printer. In some examples, this could be printed as a panoramic background on a flat or curved surface. If a content model is also printed, then the object of interest can be placed in front of the panoramic background to produce a three-dimensional “photograph” or model of the surround view. In another application, the segmented out context can be used as background to a different object of interest. Alternatively, segmented out content can be placed in a new segmented out context. In these examples, providing an alternative content or context allows objects of interest to be placed into new backgrounds, etc. For instance, a surround view of a person could be placed in various background contexts, showing the person standing on a beach in one surround view, and standing in the snow in another surround view.

With reference to FIG. 17B, shown is one example of a segmented object viewed from different angles. In particular, a rotational view 1720 is shown of an athletic shoe. Object views 1722, 1724, 1726, 1728, and 1730 show the athletic shoe from various angles or viewpoints. As shown, the object itself is shown without any background or context. According to various embodiments, these different views of the segmented object can be automatically obtained from surround view content. One application of these types of rotational views is in e-commerce to show product views from different angles. Another application can be in visual search, according to various embodiments.

According to various embodiments, surround views can be generated from data obtained from various sources and can be used in numerous applications. With reference to FIG. 18, shown is a block diagram illustrating one example of various sources that can be used for surround view generation and various applications that can be used with a surround view. In the present example, surround view generation and applications 1800 includes sources for image data 1808 such as internet galleries 1802, repositories 1804, and users 1806. In particular, the repositories can include databases, hard drives, storage devices, etc. In addition, users 1806 can include images and information obtained directly from users such as during image capture on a smartphone, etc. Although these particular examples of data sources are indicated, data can be obtained from other sources as well. This information can be gathered as image data 1808 to generate a surround view 1810, in particular embodiments.

In the present example, a surround view 1810 can be used in various applications. As shown, a surround view can be used in applications such as e-commerce 1812, visual search 1814, 3D printing 1816, file sharing 1818, user interaction 1820, and entertainment 1822. Of course, this list is only illustrative, and surround views can also be used in other applications not explicitly noted.

As described above with regard to segmentation, surround views can be used in e-commerce 1812. For instance, surround views can be used to allow shoppers to view a product from various angles. In some applications, shoppers can even use surround views to determine sizing, dimensions, and fit. In particular, a shopper can provide a self-model and determine from surround views whether the product would fit the model. Surround views can also be used in visual search 1814 as described in more detail below with regard to FIGS. 19-22. Some of the visual search applications can also relate to e-commerce, such as when a user is trying to find a particular product that matches a visual search query.

Another application of segmentation includes three-dimensional printing (3D printing) 1816. Three-dimensional printing has been recently identified as one of the future disruptive technologies that will improve the global economy in the next decade. According to various embodiments, content can be 3D printed from a surround view. In addition, the panoramic background context in a surround view can also be printed. In some examples, a printed background context can complement the final 3D printed product for users that would like to preserve memories in a 3D printed format. For instance, the context could be printed either as a flat plane sitting behind the 3D content, or as any other geometric shape (spherical, cylindrical, U shape, etc.).

As described above with regard to FIG. 16A, surround views can be stored with thumbnail views for user access. This type of application can be used for file sharing 1818 between users in some examples. For instance, a site can include infrastructure for users to share surround views in a manner similar to current photo sharing sites. File sharing 1818 can also be implemented directly between users in some applications.

Also as described with regard to FIGS. 14 and 15, user interaction is another application of surround views. In particular, a user can navigate through a surround view for their own pleasure or entertainment. Extending this concept to entertainment 1822, surround views can be used in numerous ways. For instance, surround views can be used in advertisements, videos, etc.

As previously described, one application of surround views is visual search. FIGS. 19, 20, and 22 depict examples of visual search using surround views. According to various embodiments, using surround views can provide much higher discriminative power in search results than any other digital media representation to date. In particular, the ability to separate content and context in a surround view is an important aspect that can be used in visual search.

Existing digital media formats such as 2D images are unsuitable for indexing, in the sense that they do not have enough discriminative information available natively. As a result, many billions of dollars are spent in research on algorithms and mechanisms for extracting such information from them. This has resulted in satisfactory results for some problems, such as facial recognition, but in general the problem of figuring out a 3D shape from a single image is ill-posed in existing technologies. Although the level of false positives and negatives can be reduced by using sequences of images or 2D videos, the 3D spatial reconstruction methods previously available are still inadequate.

According to various embodiments, additional data sources such as location-based information, which are used to generate surround views, provide valuable information that improves the capability of visual recognition and search. In particular example embodiments, two components of a surround view, the context and the content, both contribute significantly in the visual recognition process. In particular example embodiments, the availability of three-dimensional information that the content offers can significantly reduce the number of hypotheses that must be evaluated to recognize a query object or part of a scene. According to various embodiments, the content's three-dimensional information can help with categorization (i.e., figuring out the general category that an object belongs to), and the two-dimensional texture information can indicate more about a specific instance of the object. In many cases, the context information in a surround view can also aid in the categorization of a query object, by explaining the type of scene in which the query object is located.
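The division of labor described above, in which three-dimensional shape narrows the category and two-dimensional texture identifies the instance, can be illustrated with a minimal sketch. The disclosure does not specify descriptors or a matching algorithm, so everything here is an assumption made for illustration: the `StoredModel` fields, the fixed-length numeric descriptors, and the use of Euclidean distance are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class StoredModel:
    name: str
    category: str
    shape: tuple    # hypothetical coarse 3D shape descriptor
    texture: tuple  # hypothetical 2D texture descriptor


def distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(query_shape, models):
    """Return the category of the stored model with the closest shape."""
    nearest = min(models, key=lambda m: distance(query_shape, m.shape))
    return nearest.category


def find_instance(query_shape, query_texture, models):
    """Narrow candidates by shape-derived category, then match texture."""
    category = categorize(query_shape, models)
    candidates = [m for m in models if m.category == category]
    best = min(candidates, key=lambda m: distance(query_texture, m.texture))
    # The candidate count shows how far the hypothesis set was reduced.
    return category, best.name, len(candidates)


models = [
    StoredModel("red sneaker", "shoe", (1.0, 0.2), (0.9, 0.1)),
    StoredModel("blue sneaker", "shoe", (1.0, 0.3), (0.1, 0.9)),
    StoredModel("desk lamp", "lamp", (0.1, 1.0), (0.5, 0.5)),
]
result = find_instance((0.9, 0.2), (0.2, 0.8), models)
```

In this toy data set, the shape step discards the lamp before any texture comparison is made, which is the sense in which three-dimensional information reduces the number of hypotheses to evaluate.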

In addition to providing information that can be used to find a specific instance of an object, surround views are also natively suited for answering questions such as: “what other objects are similar in shape and appearance?” Similar to the top-N best matches provided in response to a web search query, a surround view can be used with object categorization and recognition algorithms to indicate the “closest matches,” in various examples.

Visual search using surround views can be used and/or implemented in various ways. In one example, visual search using surround views can be used in object recognition for robotics. In another example, visual search using surround views can be used in social media curation. In particular, by analyzing the surround view data being posted to various social networks, and recognizing objects and parts of scenes, better hashtag indices can be automatically generated. By generating this type of information, feeds can be curated and the search experience can be enhanced.

Another example in which visual search using surround views can be used is in a shopping context that can be referred to as “Search and Shop.” In particular, this visual search can allow recognition of items that are similar in shape and appearance, but might be sold at different prices in other stores nearby. For instance, with reference to FIG. 21, a visual search query may yield similar products available for purchase.

Yet another example in which visual search using surround views can be used is in a shopping context that can be referred to as “Search and Fit.” According to various embodiments, because surround view content is three-dimensional, precise measurements can be extracted and this information can be used to determine whether a particular object represented in a surround view would fit in a certain context (e.g., a shoe fitting a foot, a lamp fitting a room, etc.).
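As a rough illustration of a “Search and Fit” check, the sketch below compares the bounding-box dimensions of an object with those of a target space. The function name, the centimeter units, and the clearance parameter are assumptions for illustration only. The sketch also ignores real-world constraints such as which side must face up, and a real fit test for a shoe and a foot would need far more than a bounding box.

```python
def fits(object_dims_cm, space_dims_cm, clearance_cm=0.0):
    """Return True if a box-shaped object fits in a box-shaped space.

    Sorting both dimension triples and comparing them pairwise covers every
    axis-aligned orientation of the object in a single pass.
    """
    obj = sorted(object_dims_cm)
    space = sorted(space_dims_cm)
    return all(o + clearance_cm <= s for o, s in zip(obj, space))


lamp = (30, 30, 150)      # hypothetical measurements from a surround view
room = (200, 300, 240)    # hypothetical measurements of the target context
lamp_fits_room = fits(lamp, room)
```

The optional clearance models the margin a shopper might want around the object, for example a few centimeters between a lamp and the ceiling.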

In another instance, visual search using surround views can also be used to provide better marketing recommendation engines. For example, by analyzing the types of objects that appear in surround views generated by various users, questions such as “what type of products do people really use in their daily lives” can be answered in a natural, private, and non-intrusive way. Gathering this type of information can facilitate improved recommendation engines and decrease and/or stop unwanted spam or marketing ads, thereby increasing the quality of life of most users. FIG. 16B shows one implementation in which recommendations can be provided according to various embodiments of the present invention.

With reference to FIG. 19, shown is one example of a process for providing visual search of an object 1900, where the search query includes a surround view of the object and the data searched includes three-dimensional models. At 1902, a visual search query that includes a first surround view is received. This first surround view is then compared to stored surround views at 1904. In some embodiments, this comparison can include extracting first measurement information for the object in the first surround view and comparing it to second measurement information extracted from the one or more stored surround views. For instance, this type of measurement information can be used for searching items such as clothing, shoes, or accessories.

Next, a determination is made whether any stored surround views correspond to the first surround view at 1906. In some examples, this determination is based on whether the subject matter in any of the stored surround views is similar in shape to the object in the first surround view. In other examples, this determination is based on whether any of the subject matter in the stored surround views is similar in appearance to the object in the first surround view. In yet other examples, this determination is based on whether any subject matter in the stored surround views includes textures similar to those included in the first surround view. In some instances, this determination is based on whether any of the contexts associated with the stored surround views match the context of the first surround view. In another example, this determination is based on whether the measurement information associated with a stored surround view dimensionally fits the object associated with the first surround view. Of course, any of these bases can be used in conjunction with each other.

Once this determination is made, a ranked list of matching results is generated at 1908. In some embodiments, generating a ranked list of matching results includes indicating how closely any of the stored surround views dimensionally fits the object associated with the first measurement information. According to various embodiments, this ranked list can include displaying thumbnails of matching results. In some examples, links to retailers can be included with the thumbnails. Additionally, information about the matching results such as name, brand, price, sources, etc. can be included in some applications.
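The flow of FIG. 19 (receive a query, compare it to stored surround views on several bases, decide which ones correspond, and rank them) can be sketched as follows. This is only one possible reading under stated assumptions: each surround view is reduced to a dictionary of hypothetical per-criterion descriptors, cosine similarity stands in for whatever comparison an implementation would actually use, and the weights and threshold are arbitrary.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query, stored, weights, threshold=0.5):
    """Combine per-criterion similarities and return matches, best first.

    query:   dict mapping criterion name -> descriptor
    stored:  dict mapping view name -> dict of the same criteria
    weights: dict mapping criterion name -> relative importance
    """
    total = sum(weights.values())
    results = []
    for name, descriptors in stored.items():
        score = sum(
            w * cosine(query[c], descriptors[c]) for c, w in weights.items()
        ) / total
        if score >= threshold:  # the "corresponds" determination
            results.append((name, score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


query = {"shape": (1, 0), "texture": (0, 1)}
stored = {
    "a": {"shape": (1, 0), "texture": (0, 1)},
    "b": {"shape": (1, 0), "texture": (1, 0)},
    "c": {"shape": (0, 1), "texture": (1, 0)},
}
ranked = rank_matches(query, stored, {"shape": 1, "texture": 1})
```

Because the criteria are weighted rather than hard-coded, the same function covers the case where shape, appearance, texture, and context are used in conjunction with each other, simply by adding entries to the dictionaries.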

Although the previous example includes using a surround view as a visual search query to search through stored surround views or three-dimensional models, current infrastructure still includes a vast store of two-dimensional images. For instance, the internet provides access to numerous two-dimensional images that are easily accessible. Accordingly, using a surround view to search through stored two-dimensional images for matches can provide a useful application of surround views with the current two-dimensional infrastructure.

With reference to FIG. 20, shown is one example of a process for providing visual search of an object 2000, where the search query includes a surround view of the object and the data searched includes two-dimensional images. At 2002, a visual search query that includes a first surround view is received. Next, object view(s) are selected from the surround view at 2004. In particular, one or more two-dimensional images are selected from the surround view. Because these object view(s) will be compared to two-dimensional stored images, selecting multiple views can increase the odds of finding a match. Furthermore, selecting one or more object views from the surround view can include selecting object views that provide recognition of distinctive characteristics of the object.

In the present example, the object view(s) are then compared to stored images at 2006. In some embodiments, one or more of the stored images can be extracted from stored surround views. These stored surround views can be retrieved from a database in some examples. In various examples, comparing the one or more object views to the stored images includes comparing the shape of the object in the surround view to the stored images. In other examples, comparing the one or more object views to the stored images includes comparing the appearance of the object in the surround view to the stored images. Furthermore, comparing the one or more object views to the stored images can include comparing the texture of the object in the surround view to the stored images. In some embodiments, comparing the one or more object views to the stored images includes comparing the context of the object in the surround view to the stored images. Of course, any of these criteria for comparison can be used in conjunction with each other.

Next, a determination is made whether any stored images correspond to the object view(s) at 2008. Once this determination is made, a ranked list of matching results is generated at 2010. According to various embodiments, this ranked list can include displaying thumbnails of matching results. In some examples, links to retailers can be included with the thumbnails. Additionally, information about the matching results such as name, brand, price, sources, etc. can be included in some applications.
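The reason selecting multiple object views can increase the odds of a match can be seen in a small sketch of the FIG. 20 flow. The assumptions here are that each selected object view and each stored two-dimensional image has already been reduced to a hypothetical numeric descriptor, and that a stored image is scored by its best-matching object view. The similarity function is a simple stand-in, not a method taken from the disclosure.

```python
def similarity(a, b):
    """Similarity in (0, 1] derived from squared Euclidean distance."""
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_stored_images(object_views, stored_images, top_n=3):
    """Score each stored image by its best-matching object view.

    object_views:  list of descriptors, one per selected 2D view
    stored_images: dict mapping image id -> descriptor
    """
    scored = []
    for image_id, descriptor in stored_images.items():
        best = max(similarity(view, descriptor) for view in object_views)
        scored.append((image_id, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


object_views = [(0.0, 0.0), (1.0, 1.0)]  # e.g., front and side views
stored_images = {
    "img1": (1.0, 1.0),
    "img2": (0.0, 0.5),
    "img3": (5.0, 5.0),
}
top_two = rank_stored_images(object_views, stored_images, top_n=2)
```

With only the first object view, "img1" would score poorly; adding the second view lets it match exactly, which is the effect of selecting views from more than one angle.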

With reference to FIG. 21, shown is an example of a visual search process 2100. In the present example, images are obtained at 2102. These images can be captured by a user or pulled from stored files. Next, according to various embodiments, a surround view is generated based on the images. This surround view is then used as a visual search query that is submitted at 2104. In this example, a surround view can be used to answer questions such as “which other objects in a database look like the query object.” As illustrated, surround views can help shift the visual search paradigm from finding other “images that look like the query,” to finding other “objects that look like the query,” due to their better semantic information capabilities. As described with regard to FIGS. 19 and 20 above, the surround view can then be compared to the stored surround views or images and a list of matching results can be provided at 2106.

Although the previous examples of visual search include using surround views as search queries, it may also be useful to use two-dimensional images as search queries in some embodiments. With reference to FIG. 22, shown is an example of a process for providing visual search of an object 2200, where the search query includes a two-dimensional view of the object and the data searched includes surround view(s). At 2202, a visual search query that includes a two-dimensional view of an object to be searched is received. In some examples, the two-dimensional view is obtained from an object surround view, wherein the object surround view includes a three-dimensional model of the object. Next, the two-dimensional view is compared to surround views at 2204. In some examples, the two-dimensional view can be compared to one or more content views in the surround views. In particular, the two-dimensional view can be compared to one or more two-dimensional images extracted from the surround views from different viewing angles. According to various examples, the two-dimensional images extracted from the surround views correspond to viewing angles that provide recognition of distinctive characteristics of the content. In other examples, comparing the two-dimensional view to one or more surround views includes comparing the two-dimensional view to one or more content models. Various criteria can be used to compare the images or models such as the shape, appearance, texture, and context of the object. Of course, any of these criteria for comparison can be used in conjunction with each other.

With reference to FIG. 23, shown is a particular example of a computer system that can be used to implement particular examples of the present invention. For instance, the computer system 2300 can be used to provide surround views according to various embodiments described above. According to particular example embodiments, a system 2300 suitable for implementing particular embodiments of the present invention includes a processor 2301, a memory 2303, an interface 2311, and a bus 2315 (e.g., a PCI bus). The interface 2311 may include separate input and output interfaces, or may be a unified interface supporting both operations. When acting under the control of appropriate software or firmware, the processor 2301 is responsible for tasks such as optimization. Various specially configured devices can also be used in place of a processor 2301 or in addition to processor 2301. The complete implementation can also be done in custom hardware. The interface 2311 is typically configured to send and receive data packets or data segments over a network. Particular examples of interfaces the device supports include Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like.

In addition, various very high-speed interfaces may be provided such as fast Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile RAM. The independent processors may control such communications-intensive tasks as packet switching, media control and management.

According to particular example embodiments, the system 2300 uses memory 2303 to store data and program instructions and to maintain a local side cache. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store received metadata and batch requested metadata.

Because such information and program instructions may be employed to implement the systems/methods described herein, the present invention relates to tangible, machine readable media that include program instructions, state information, etc. for performing various operations described herein. Examples of machine-readable media include hard disks, floppy disks, magnetic tape, optical media such as CD-ROM disks and DVDs; magneto-optical media such as optical disks, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and programmable read-only memory devices (PROMs). Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

Although many of the components and processes are described above in the singular for convenience, it will be appreciated by one of skill in the art that multiple components and repeated processes can also be used to practice the techniques of the present disclosure.

While the present disclosure has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention. It is therefore intended that the invention be interpreted to include all variations and equivalents that fall within the true spirit and scope of the present invention.

What is claimed is:
 1. A method comprising: receiving a visual search query, wherein the visual search query includes a first surround view of an object to be searched; comparing the first surround view to one or more stored surround views; determining whether any of the stored surround views correspond to the first surround view; generating a ranked list of matching results, wherein the ranked list is based on how closely the one or more stored surround views correspond to the first surround view included in the visual search query.
 2. The method of claim 1, wherein determining whether any of the stored surround views correspond to the first surround view includes determining whether subject matter in any of the stored surround views are similar in shape to the object in the first surround view.
 3. The method of claim 1, wherein determining whether any of the stored surround views correspond to the first surround view includes determining whether any of the subject matter in the stored surround views are similar in appearance to the object in the first surround view.
 4. The method of claim 1, wherein determining whether any of the stored surround views correspond to the first surround view includes determining whether any subject matter in the stored surround views include similar textures included in the first surround view.
 5. The method of claim 1, wherein the first surround view includes a first context, wherein the first context includes scenery surrounding the object, and wherein determining whether any of the stored surround views correspond to the first surround view includes determining whether any of the surround views includes a second context that matches the first context.
 6. The method of claim 1, wherein comparing the first surround view to one or more stored surround views includes extracting first measurement information for the object in the first surround view, extracting second measurement information for the one or more stored surround views, and comparing the first measurement information to the second measurement information.
 7. The method of claim 6, wherein determining whether any of the stored surround views correspond to the first surround view includes determining whether the second measurement information dimensionally fits the object associated with the first measurement information.
 8. A method comprising: receiving a visual search query, wherein the visual search query includes a surround view of an object to be searched; selecting one or more object views from the surround view, wherein the object views correspond to two-dimensional representations of the object from different viewing angles in the surround view; comparing the one or more object views to a plurality of stored images; determining whether any of the stored images correspond to the one or more object views; generating a ranked list of matching results, wherein the ranked list is based on how closely the stored images correspond to the object views of the surround view included in the visual search query.
 9. The method of claim 8, wherein one or more of the plurality of stored images are included in one or more stored surround views.
 10. The method of claim 8, wherein the stored images are extracted from stored surround views.
 11. The method of claim 10, wherein the stored surround views are stored in a database.
 12. The method of claim 8, wherein comparing the one or more object views to a plurality of stored images includes comparing the shape of the object in the surround view to the plurality of stored images.
 13. The method of claim 8, wherein comparing the one or more object views to a plurality of stored images includes comparing the appearance of the object in the surround view to the plurality of stored images.
 14. The method of claim 8, wherein comparing the one or more object views to a plurality of stored images includes comparing the texture of the object in the surround view to the plurality of stored images.
 15. The method of claim 8, wherein comparing the one or more object views to a plurality of stored images includes comparing the context of the object in the surround view to the plurality of stored images.
 16. A method comprising: receiving a visual search query, wherein the visual search query includes a two-dimensional view of an object to be searched; comparing the two-dimensional view to one or more surround views, wherein the surround views include multi-view interactive digital media representations of content within the surround views; determining whether any of the surround views correspond to the two-dimensional view; generating a ranked list of matching results, wherein the ranked list is based on how closely the surround views correspond to the two-dimensional view of the object included in the visual search query.
 17. The method of claim 16, wherein comparing the two-dimensional view to one or more surround views includes comparing the two-dimensional view to one or more content views in the surround views.
 18. The method of claim 16, wherein comparing the two-dimensional view to one or more surround views includes comparing the two-dimensional view to one or more two-dimensional images extracted from the surround views from different viewing angles.
 19. The method of claim 18, wherein the two-dimensional images extracted from the surround views correspond to viewing angles that provide recognition of distinctive characteristics of the content.
 20. The method of claim 16, wherein the two-dimensional view of an object to be searched is obtained from an object surround view, wherein the object surround view includes a three-dimensional model of the object.